Abstract. According to a popular family of hypotheses, crossmodal matches between distinct
features hold because they correspond to the same polarity on several conceptual dimensions (such 
as active-passive, good-bad, etc.) that can be identified using the semantic differential technique. 
The main problem here resides in turning this hypothesis into testable empirical predictions. In the 
present study, we outline a series of plausible consequences of the hypothesis and test a variety of 
well-established and previously untested crossmodal correspondences by means of a novel internet-based
testing methodology. The results highlight that the semantic hypothesis cannot easily explain
differences in the prevalence of crossmodal associations built on the same semantic pattern (fast 
lemons, slow prunes, sour boulders, heavy red); furthermore, the semantic hypothesis only minimally 
predicts what happens when the semantic dimensions and polarities that are supposed to drive 
such crossmodal associations are made more salient (e.g., by adding emotional cues that ought to 
make the good/bad dimension more salient); finally, the semantic hypothesis does not explain why 
reliable matches are no longer observed once intramodal dimensions with congruent connotations 
are presented (e.g., visually presented shapes and colour do not appear to correspond). 

Keywords: crossmodal correspondences, internet-based testing, intramodal correspondences, semantic 
hypothesis, semantic differential technique, sound symbolism. 

1 Introduction 

The last few years have seen a rapid growth of interest in the study of crossmodal correspondences (see
Spence, 2011, 2012, for reviews). The term itself is one of many (see Spence, 2011, for various others)
that have been used by researchers over the years in order to describe the fact that neurologically
normal human observers appear to match objects, features, or dimensions of experience across sensory
modalities, including in cases where they do not seem to be regularly co-experienced in the
environment (see Deroy, Crisinel, & Spence, in press; Spence & Deroy, 2012). Initially, such intuitive
crossmodal matches often strike us as surprising, not to say arbitrary. When, for example, people are asked
to match non-words like "Takete" or "Kiki," and "Maluma" or "Bouba" to angular and rounded shapes
(Kohler, 1929, 1947; Ramachandran & Hubbard, 2001), they respond consistently, matching "Takete"
and "Kiki" to the angular shape, and "Maluma" and "Bouba" to the rounded one. When asked which
colour is heavier, red or yellow, people consistently tend to say that red is the heavier of the two
(Alexander & Shansky, 1976); they also intuitively associate brighter surfaces with higher pitched
sounds, and darker ones with lower pitched sounds (Ludwig, Adachi, & Matsuzawa, 2011; see also
O'Mahony, 1983).

One obvious limitation with many of these subjective questions, tested on only a limited number 
of participants, is that they do not reveal what drives/underlies the crossmodal matches. Over the years, 
several different explanations have been put forward, including the popular semantic hypothesis inspired 
by work on the semantic differential technique popularized by Osgood, Suci, and Tannenbaum ( 1957 ; see 
also Osgood, 1960; Snider & Osgood, 1969). The idea here is that the presented objects are categorized
along a certain number of common dimensions (such as active/passive; good/bad; dominant/submissive, 
etc.) and match if they fall in the same dimensional region as one another along some number of these 
dimensions. For instance, bright and high pitch are both "active," and therefore considered as more intui- 
tively congruent than bright and low pitch. A similar hypothesis has also been adopted very recently by 
Walker and Walker ( 2012 ) and defended, albeit without commitment to specific conceptual dimensions, 
by Martino and Marks ( 1999 , 2001 ; see also Palmer, Schloss, Xu, & Prado-Leon, 2013 ). 

The lack of specificity, and the plurality of dimensions at stake, makes it feasible to criticize the 
semantic hypothesis as being possibly rather ad hoc: It is, after all, possible to find a common feature 
for any two objects one can think of, and then reconstruct it as the dimension that governed their 
matching. It is also possible that the participants in such studies may try to guess what sort of con- 
ceptual association the experimenter(s) had in mind when selecting a series of stimuli, making the 
effect an artifact of the forced-choice task more than the test of a pre-existing intuition (e.g., Pratt, 
1990 ; Waterman, Blades, & Spencer, 2000 ). The goal of the present study was therefore to explore 
ways in which the semantic hypothesis could be put to the test empirically. 

A first test here, inspired by the title of an article by Brown ( 1958 ), was to find some crossmodal 
matches which are semantically sound, and yet do not give rise to consistent intuitions. In the title of 
his review of Osgood et al.'s ( 1957 ) book on the semantic differential technique, Brown, for instance, 
considered the prosaic nature of a question like "Is a boulder sweet or sour?" Here, we tested the 
generality of this, and other crossmodal correspondences (such as that lemons are fast and that red 
is heavier than yellow), by utilizing an internet-based testing methodology which was trialed on the 
occasion of this study. We also tested a number of other similar associations involving non-relative 
or qualitative features such as "sweet-sour" that have frequently been endorsed in the literature (i.e., 
that lemons are fast and that red is heavier than yellow) to see whether the kind of effect obtained 
with boulders would generalize to these other cases. Similarly, we tested several other associations 
with selected features (sharp vs. round; red vs. yellow; rough vs. soft) that were already assigned a 
place on specific semantic dimensions by previous researchers/experiments to see whether or not 
they would generate intuitive matches that would be consistent with the semantic hypothesis. 

A related prediction, which the defenders of the semantic hypothesis have not explicitly explored 
at this point, concerns intramodal correspondences: If all objects and features can be categorized 
according to certain common dimensions, and if any two features associated with the same pole on a 
given dimension should be matched by participants — this should happen regardless of whether those 
features are taken from the same versus different sensory modality. In this sense, one would expect 
people to match active "angular" to active "yellow" or "rough" within the visual modality, as much as 
they match active "up" and "high pitch" across modalities (e.g., across vision, audition, and touch; see 
Evans & Treisman, 2010 ; Occelli, Spence, & Zampini, 2009 ). 

A final inference that can be drawn from the semantic hypothesis would appear to be that features 
that are easier to place in terms of certain crucial semantic dimensions (such as active/passive; good/ 
bad; dominant/submissive) should lead to more frequent or confident matches by participants. For 
instance, one can think that the success of the Bouba/Kiki effect (e.g., Bremner et al., 2013 ; Kohler, 
1929 , 1947 ; Ramachandran & Hubbard, 2005 ) can be attributed to the fact that the sounds and/or the 
figures are easy to place on a scale that is anchored by the labels "passive" and "active" or "bad" and 
"good." Here, we tested whether participants would find this placing easier (i.e., more intuitive) by 
adding cues of the same polarity to the shape (such as adding cues like "happy" to the sharp shape, 
to make it more active, dominant, and good) or more difficult by adding cues going in the opposite 
direction (such as sad to the sharp shape, to make it less active, dominant, or good). In the present 
study, we tested these three hypotheses in a block of trials using a randomized internet testing method, 
first in an English-speaking environment that was highly controlled (Experiment 1), and then in a 
much broader selection of participants from different parts of the world in a somewhat less controlled 
manner (Experiment 2), in order to examine any differences that might be attributable to a partici- 
pant's prior linguistic or cultural exposure (Eitan & Timmers, 2010 ). The consequences of the results 
obtained here for the semantic explanation of crossmodal matches are discussed below. 

2 Experiment 1: Crossmodal matches and semantic dimensions 

In Experiment 1 , we tested three sets of predictions related to the semantic hypothesis of crossmodal 
matching. First, we were interested in determining whether certain matches that would challenge this 
explanation also existed, primarily those involving qualitative (that is, metathetic, see Smith & Sera,
1992 ; Spence, 2011 ; Stevens, 1957 ) dimensions such as for hue or taste quality, or whole complex 
objects like boulders or fruits. Second, we wanted to test whether those features associated with the 
same polarity on specific dimensions would also be associated intramodally (e.g., shape and colours; 
colours and visually displayed textures). Third, we examined whether adding cues that made a certain 
polarity more salient on a particular semantically relevant dimension (e.g., happy for active sharpness) 
would lead to more confident or faster responding by participants on a classical crossmodal matching 
task ("Bouba-Kiki" being matched with round-angular shapes; Bremner et al., 2013 ; Kohler, 1929, 
1947 ). 

2.1 Methods 

2.1.1 Participants 

Eighty-one undergraduate students (71 female and 10 male) from York St. John University (UK)
volunteered to take part in this study as part of a first year practical class. The participants ranged in age
from 18 to 32 years (mean of 19.4 years). All of the participants provided informed consent prior to the
study and the experimental protocol was approved by the University Ethics Committee. Participants 
reported their country of origin as United Kingdom (80) and Ireland (1). The average time taken to 
complete the study was 425 seconds (standard deviation 72 seconds). 

2.1.2 Stimuli 

Text descriptors and images were used as stimuli. The participants always had two response options 
(either text- or image-based) for each of the stimuli that were presented. The stimuli consisted of a 
subset presented in one configuration only ("boulder," "heavier," "prune," "yellow") and a subset of 
stimuli (blob, star, rectangle) that were varied parametrically across three factors (shape, colour, emo- 
tion; e.g., a blob that is red and has a smiley face). The stimuli and their possible response options 
available to participants are shown in Figure 1 . We selected a variety of non-words that have been 
shown to correspond to "Bouba" (Bouba, Maluma, Lomoro, Mamima, Muromu, Malomu) and "Kiki" 
(Kiki, Takete, Kiriki, Teziki, Kichiki, Kekiti) in previous studies of sound symbolism (see Nielsen & 
Rendall, 2011). These non-words were chosen randomly from each list as needed (one randomly chosen
word from each list was presented five times and the remaining words six times each). The stimuli
were displayed against a grey background (RGB 217, 217, 217). 
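
As a rough illustration of the parametrically varied subset, the following Python sketch enumerates a full crossing of shape, colour, and emotion. The factor levels are those named in the log-linear analysis of Section 2.2. The assumption that every combination was presented exactly once, and the names `FACTORS` and `parametric_stimuli`, are introduced here for illustration only and are not stated in the text.

```python
from itertools import product

# Factor levels as named in the log-linear analysis (Section 2.2).
FACTORS = {
    "shape": ("blob", "rectangle", "star"),
    "colour": ("red", "white", "yellow"),
    "emotion": ("happy", "neutral", "sad"),
}


def parametric_stimuli(factors=FACTORS):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


stimuli = parametric_stimuli()
print(len(stimuli))  # number of shape x colour x emotion combinations
print(stimuli[0])    # first combination in enumeration order
```

Under that assumption the crossing yields 27 stimuli, which, together with the eight rows of single-configuration stimuli in Figure 1a, would be consistent with the 35 trials per participant reported in the Procedure.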

2.1.3 Apparatus 

The participants completed the experiment in one of four lab classes on PC computers. The 
experiment was conducted on the internet through the Adobe Flash based Xperiment software 
( http://www.xperiment.mobi downloaded on 15/10/12). Testing was conducted in parallel on a 
number of computers in the psychology laboratory. The monitor resolution was set at 1,024 × 768
and had a screen size of 38.1 cm, corner to corner. All of the stimuli were clearly visible, each filling 
approximately 15% of the screen. 

2.1.4 Design 

A repeated measures design was used with all of the participants undertaking all of the experimental 
trials. The dependent variables were the response chosen (two possibilities), the reaction time (RT), 
and the confidence rating. 

2.1.5 Procedure 

The procedure was explained verbally to the participants by the experimenter and, at the start of the 
study, by means of an illustration (see Figure 2 ). The participants were instructed to follow the instruc- 
tions that were presented on the screen, to remain quiet after completing the study, and to initiate the 
study when they were ready. All of the experimental trials followed the same procedure. A question 
mark was always presented in the centre of the screen for the duration of each trial. The test stimulus 
was positioned immediately above the question mark. The participant dragged the test stimulus to one 
of the two possible response options displayed equidistant to the left and right of the midline in the 
lower half of the screen. This response was recorded automatically. Left- and right-option positioning 
was randomized across trials and participants, as was the order in which the trials were presented. The 
time from the appearance of the stimulus up until when the stimulus was "dropped in place" over a
given response option by the participant was also recorded. Upon selecting a response, the participants
were asked "How confident are you that other people will respond in the same way as you?" The
participants made their responses on a five-point horizontal scale shown above the question mark with the
following options, arranged from left to right: "very unconfident," "unconfident," "uncertain,"
"confident," "very confident." After responding, a "continue" button appeared. Pressing this button cleared
the screen and initiated the next trial after a 500 ms pause. There were 35 trials per participant, which
took approximately 10 minutes to complete. After the completion of the study, all of the participants
were debriefed as to the nature of the study.

Figure 1. Stimuli used in Experiment 1 together with the two response options. (a) The subset of stimuli
presented in a single configuration only (no parametric variation); (b) the subset of stimuli that were parametrically
varied in terms of their shape, colour, and emotion.

Figure 2. An illustration of the experimental procedure presented to the participants at the start of Experiment 1.

2.2 Results

The results (see Figure 3) clearly demonstrate that lemons and prunes differ in terms of their allocation
to the labels "fast" and "slow," χ²(1) = 40.55, p < 0.001: Lemons were significantly more likely to be
labelled as "fast," χ²(1) = 9.00, p < 0.01, whereas prunes were significantly more likely to be labelled
as "slow," χ²(1) = 34.68, p < 0.001. In terms of the odds ratios, lemons were twice as likely to be
labelled as "fast" as they were to be labelled as "slow" (respective counts 54, 27), whilst prunes were
4.79 times more likely to be labelled as "slow" than as "fast" (67, 14). Boulders were 3.50 times more
likely to be labelled as "sour" than as "sweet," χ²(1) = 25.00, p < 0.001 (63, 18). Meanwhile, "heavier"
was 8.00 times more likely to be labelled as "red" than as "yellow," χ²(1) = 49.00, p < 0.001 (72, 9).
There was no association between shape (circle, triangle) and either the colour descriptor chosen
("pink" or "red") or the material-image chosen (rough, smooth).
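
The single-stimulus statistics in this paragraph can be recomputed from the reported counts. The sketch below assumes that each test is a one-degree-of-freedom Pearson χ² test against an even 50/50 split, that the lemon versus prune comparison is an uncorrected 2 × 2 Pearson test, and that the reported "odds ratios" are simply the ratio of the two response counts. The function names are illustrative; the software actually used by the authors is not specified in the text.

```python
import math


def gof_chi2(a, b):
    """Pearson chi-square (df = 1) for two counts against a 50/50 split."""
    expected = (a + b) / 2
    return ((a - expected) ** 2 + (b - expected) ** 2) / expected


def chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square (df = 1) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Counts reported for Experiment 1 (more frequent response first).
counts = {"lemon": (54, 27), "prune": (67, 14), "boulder": (63, 18), "heavier": (72, 9)}
for name, (major, minor) in counts.items():
    stat = gof_chi2(major, minor)
    print(name, round(stat, 2), round(major / minor, 2), p_value_df1(stat))

# Lemon (fast, slow) against prune (fast, slow).
print(round(chi2_2x2(54, 27, 14, 67), 2))
```

Under these assumptions the computed statistics match the values reported above (9.00, 34.68, 25.00, 49.00, and 40.55), as do the count ratios (2.00, 4.79, 3.50, and 8.00).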

Participants' confidence ratings were recoded numerically. A "very confident" response was 
assigned a value of 4, while a "very unconfident" response was assigned a value of 0; intermediate
ratings were assigned appropriate intermediate values. In the subsequent analyses, confidence was used
as the dependent variable. An ANOVA with shape (circle or triangle) as a repeated-measures factor and
material-image (hard or soft) as a between-participants factor revealed a significant interaction term,
F(1, 158) = 4.94, p < 0.05, and post hoc LSD tests revealed that the circle was more confidently rated
as "soft" than as "hard" (p < 0.05; respective means 2.61 and 2.14; in terms of count, the number of
participants rating circles as "soft" was 38 and as "hard" was 43). There was also an interaction in an
ANOVA with fruit (lemon and prune) and fastness ("slow" and "fast") as factors, F(1, 158) = 9.29,
p < 0.01, with prunes being more confidently rated by participants as "slow" than as "fast" (p < 0.05),
whilst lemons were more confidently rated as being "fast" than "slow" (p < 0.05). Finally, "heavier"
was more confidently rated as "red" than as "yellow," t(79) = 3.73, p < 0.001. The boulder, likewise,
was more confidently rated as "sour" than "sweet," t(79) = 2.40, p < 0.05 (respective means 2.65, 2.17).

[Figure 3 axis labels: confidence (y-axis); "lemon" (fast, slow), "prune" (fast, slow), "boulder" (sweet, sour), "heavier" (yellow, red)]

Figure 3. (a) Bar graph showing participants' confidence ratings when allocating stimuli to the different descriptors
(error bars 2 SEM). (b) Bar graph showing proportion of different stimuli allocated to the various descriptors
(**p < 0.01, ***p < 0.001, for participants favouring one response significantly over the other; according to χ² tests).

A log-linear analysis was performed, using text ("Bouba" and "Kiki") × shape (blob, rectangle,
star) × colour (red, white, yellow) × emotion (happy, neutral, sad) as the variables. The four-way
log-linear analysis produced a final model that retained the text × shape × colour and text ×
emotion interactions. The likelihood ratio of this model was χ²(32) = 14.83, p = 1.00, indicating that it
provided a good fit to the data. There was a trend effect for the text × shape × colour interaction,
χ²(2) = 8.99, p = 0.062. The interactions between text and shape, χ²(2) = 276.16, p < 0.001, and
between text and emotion, χ²(2) = 24.09, p < 0.001, were significant. In order to break the former
interaction down, separate χ² tests were performed with each type of shape to determine whether they
were assigned as "Bouba" or "Kiki" (text). Blobs, χ²(1) = 67.00, p < 0.001, rectangles, χ²(1) = 61.07,
p < 0.001, and stars, χ²(1) = 143.11, p < 0.001, were all allocated more to one text descriptor than
to the other (see Figure 4a). Odds ratios indicated that the rectangles and blob shapes were 1.81 and
1.87 times more likely to be rated as "Bouba" than as "Kiki," respectively. By contrast, the star was
2.59 times more likely to be rated as "Kiki" (than as "Bouba"), explaining the original interaction and
implying a genuine difference in the descriptor chosen between the star and the other two shapes. No
lower-order significant factors which interacted significantly in higher order effects are reported.

The text by emotion interaction was broken down in the same fashion as the previous text and
shape interaction in order to determine whether there were any significant differences in the
assignment of "Bouba" or "Kiki." This was only true for sad stimuli, χ²(1) = 24.27, p < 0.001, which were
1.45 times more likely to be rated as "Bouba" than as "Kiki" (see Figure 5a).

A four-way between-participants ANOVA was conducted on the confidence rating data (note that
one data point was missing from this analysis). The factors consisted of the same independent
variables as above. The interaction between text selected and shape was significant, F(2, 2,132) = 21.18,
p < 0.001 (see Figure 4b ). Post hoc LSD tests revealed that the participants were more confident in 
assigning the label "Kiki" than the label "Bouba" to the stars (p < 0.001), and, conversely, the label 
"Bouba" (rather than "Kiki") to the blobs (p < 0.001) and rectangles (p < 0.01). 
Figure 4. (a) Bar graph showing the proportion of blob, rectangle, and star stimuli being associated with the
words "Bouba" and "Kiki" (grey and white bars, respectively, where ***p < 0.001) in Experiment 1; (b) line
graph showing "Bouba" and "Kiki" confidence ratings for the blob, rectangle, and star stimuli (error bars 2 SEM).

Figure 5. (a) Bar graph showing the proportion of happy, neutral, and sad stimuli associated with the words
"Bouba" and "Kiki" in Experiment 1 (where ***p < 0.001). (b) Line graph showing RT (in ms) to assign happy,
neutral, and sad stimuli as either "Bouba" or "Kiki"; error bars (2 SEM) were calculated for log RT but then
reverted to normal scale via an inverse log function for ease of presentation.
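
The note on error bars in the Figure 5 caption can be made concrete with a short sketch: the mean and 2 SEM are computed on log RT and the resulting limits are then mapped back to milliseconds with the inverse of the log. The use of the natural logarithm and the function name `back_transformed_bars` are assumptions made for illustration; the text does not state which base was used.

```python
import math
import statistics


def back_transformed_bars(rts_ms, n_sem=2.0):
    """Return (lower, centre, upper) in ms from the mean +/- n_sem SEM of log RT."""
    logs = [math.log(rt) for rt in rts_ms]
    mean_log = statistics.mean(logs)
    sem_log = statistics.stdev(logs) / math.sqrt(len(logs))
    return (
        math.exp(mean_log - n_sem * sem_log),
        math.exp(mean_log),
        math.exp(mean_log + n_sem * sem_log),
    )


# Hypothetical RTs in ms, used only to show the shape of the result.
lower, centre, upper = back_transformed_bars([1000.0, 2000.0, 4000.0])
print(lower, centre, upper)
```

One consequence of this approach is that the back-transformed centre is the geometric rather than the arithmetic mean, and the error bars are asymmetric on the millisecond scale (longer above the centre than below it).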



The RT data were leptokurtic and positively skewed, and hence, in order to help correct for this, 
were log-transformed (see Field, 2005). Outlying data for RTs to stimuli reported as "Bouba" and 
those reported as "Kiki" were screened separately and corrected for (<1% in both cases; outliers are
henceforth defined as exceeding ±3 standard deviations surrounding the mean, and henceforth
corrected for by being replaced with the next most extreme but non-outlying data point). For "single
configuration stimuli" (Figure 1a), two two-way ANOVAs were conducted with log RT as the dependent
variable, a repeated-measures factor of stimulus (circle or triangle), and also a between-participants
response factor of either pink/red or hard-/soft-material (see the bottom four rows in Figure 1a). Four
independent-samples t-tests were also conducted in order to see whether each stimulus ("heavier,"
"boulder," "lemon," or "prune") was allocated more so to one response than to the other (responses were
respectively "red"/"yellow," "sweet"/"sour," "fast"/"slow" and "fast"/"slow" again; see the top four
rows in Figure 1a). There were no significant effects on log RT between allocations in those trials
with a circle, triangle, "heavier," and "boulder" as the stimuli. There was a trend towards an effect
for the "lemon" and "prune," F(1, 158) = 3.52, p = 0.063, with participants taking more time to
assign "lemons" than to assign "prunes" (computed averages 3,038 and 2,529 ms, respectively). For
"parametrically varied" stimuli (Figure 1b), a four-way between-participants ANOVA was conducted
with log RT as the dependent variable and shape, colour and emotion as independent variables. The
only significant factor to emerge from this analysis was emotion, F(2, 2,133) = 5.24, p < 0.01 (see
Figure 5b ). Post hoc LSD tests revealed that participants responded more slowly to sad stimuli than 
to either happy (p < 0.01) or neutral (p < 0.001) stimuli (average computed RTs were 2,455, 2,256 
and 2,182 ms, respectively). This finding has been reported before (one explanation is that negative 
stimuli might be processed more slowly than positive and neutral stimuli to avoid potentially danger- 
ous errors, see Ihssen & Keil, 2013 ). 
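
The outlier rule described in this paragraph (values beyond ±3 standard deviations replaced by the next most extreme non-outlying value) is a form of winsorizing. A minimal sketch follows, assuming a single screening pass applied to natural-log RTs and using the sample standard deviation. The order of transformation and screening, the log base, and the function name `replace_outliers` are assumptions rather than details given in the text.

```python
import math
import statistics


def replace_outliers(values, n_sd=3.0):
    """Replace values beyond n_sd sample SDs of the mean with the nearest retained value."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    lower, upper = mean - n_sd * sd, mean + n_sd * sd
    # Values inside the bounds are kept unchanged.
    kept = [v for v in values if lower <= v <= upper]
    floor, ceiling = min(kept), max(kept)
    return [floor if v < lower else ceiling if v > upper else v for v in values]


# Hypothetical RTs in ms: 29 ordinary responses and one very slow one.
rts_ms = [1900.0] * 15 + [2100.0] * 14 + [200000.0]
log_rts = [math.log(rt) for rt in rts_ms]
cleaned = replace_outliers(log_rts)
print(round(math.exp(max(cleaned))))
```

In the paper the screening was carried out separately for responses reported as "Bouba" and as "Kiki"; in this sketch that would correspond to calling the function once per response group.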

2.3 Discussion 

The results of Experiment 1 can be divided into two parts: The re-testing of already documented 
crossmodal matches, and the testing of new correspondences. Our results establish that boulders are 
associated with sourness while prunes are associated with slowness. By contrast, no significant match- 
ing pattern was found intramodally, that is, between shapes (circle/triangle) and texture (soft/round), 
or shapes and colour (red/pink) — although those participants who rated circles as soft were more con- 
fident of their choice than those who rated them as rough. Regarding those crossmodal matches that 
have already been documented in the literature, our results are consistent with previous or expected 
results (such as the fastness of lemons, see Smith, 2012, and the heaviness of red or the attribution
of "Kiki" to star-shaped figures). That said, our results also demonstrate that intermediate figures 
(rectangle with rounded corners) are rated as Bouba, and that, in some circumstances, emotional cues 
can override the intuitive associations that exist between shapes and sounds, as sad faces embedded 
in either of the shapes have a tendency to be associated with "Bouba." This result can be explained 
in terms of the semantic hypothesis by asserting that the sound "Bouba" is associated with passive 
stimuli and that sad faces are also more strongly associated with passivity than are shapes. However, 
the same result underscores the risk of the ad hoc application of the semantic hypothesis: if "Bouba" 
is associated with the blobby shape, and the blobby shape can be described as good (or at least as less 
bad) than the spiky star shape, then it is rather surprising that "Bouba" is also associated with sad, i.e., 
negative emotions. No effect on the "Bouba"-"Kiki" matching was documented in the case of colour 
(red/yellow), also creating something of a tension for the semantic hypothesis: If the semantic dimen- 
sions attributed to "Kiki" are active and/or bad (explaining the matching with angularity), then one 
would expect a match with "red" which is often recognized as alerting or enlivening (Elliot & Maier, 
2007 ; Hogg, 1969 ). 

3 Experiment 2: Crossmodal matches across linguistic borders 

In a follow-up study, we tested whether the previous results could be attributed to semantic
dimensions that are common to all objects across cultures (as posited by the semantic hypothesis) and
whether they were universal as often assumed in the field of crossmodal correspondences (see Bremner 
et al., 2013 , for a discussion). 

3.1 Methods 

3.1.1 Participants 

Eighty-two individuals (39 female and 43 male) recruited from Amazon's Mechanical Turk took part 
in this study in exchange for a payment of 0.80 US dollars on April 19, 2013, from 12:00 GMT
onwards over a four-hour period (see Crump, McDonnell, & Gureckis, 2013 , for a methodological 
overview). The participants ranged in age from 22 to 68 years (mean of 35.3 years). All of the partici- 
pants provided informed consent prior to their taking part in the study. The participants' countries of 
origin are shown in Figure 6 (derived from the IP address via the GeoLite City database; downloaded 
from http://dev.maxmind.com/geoip/legacy/geolite on May 1, 2013). The average time taken to 
complete the study was 240 seconds (standard deviation 78 seconds). 

3.1.2 Stimuli, apparatus, design, and procedure 

This experiment was almost identical to the previous one in terms of the stimuli, design, and proce- 
dure, but excluded verbal instruction of the procedure to the participant by the experimenter (given 
the remote locations in which the participants completed the study). The apparatus varied across par- 
ticipants. Two participants performed the study on Apple computers (version 10.7.5 and 10.6.8) and 
the remainder on machines running Windows (20 participants on Windows XP, 44 on Windows 7 and 
10 on Windows 8). The most common resolution of the participants' monitors was 1,366 × 768 (28
participants) and the mean resolution was 1,422 × 867 (their respective standard deviations were 243
and 136). 




Figure 6. World map showing the geo-location of participants' data ©2012 Google, INEGI, MapLink (red circle = 
location of a female participant; blue circle = male participant). 
3.2 Results

The same statistical procedures reported in Experiment 1 were also used here. To summarize, there 
was little difference between the results of the two experiments. Once again, the results reported here 
demonstrate that lemons and prunes differed in terms of the allocation to the labels "fast" and "slow,"
χ²(1) = 11.83, p < 0.001: lemons were significantly more likely to be labelled as "fast," χ²(1) = 3.95,
p = 0.05, whereas prunes were significantly more likely to be labelled as "slow," χ²(1) = 8.24,
p < 0.01. In terms of the odds ratios (see Figure 7), lemons were 1.56 times more likely to be labelled
as "fast" than as "slow" (respective counts 50, 32), whilst prunes were 1.93 times more likely to be
labelled as "slow" than as "fast" (54, 28). Boulders were 1.73 times more likely to be labelled as
"sour" than as "sweet," χ²(1) = 5.90, p < 0.02 (52, 30). Meanwhile, "heavier" was 4.13 times more
likely to be labelled with the "red" descriptor than with the "yellow" descriptor, χ²(1) = 30.49, p <
0.001 (66, 16). In contrast to the results of Experiment 1, there was an association between the colour
descriptor chosen ("pink" or "red") and shape, with circles being 1.93 times more likely to be labelled
as red than as pink, χ²(1) = 8.24, p < 0.01 (54, 28). There was no association between the triangle
and colour, and as with Experiment 1, no association between material-image (rough and smooth)
and shape. In terms of the confidence analyses, there was only a significant effect of text for boulders,
F(1, 80) = 6.48, p < 0.02, with stimuli rated as "sour" being more confidently rated than were the
"sweet" stimuli (means were 2.56 and 2.13, respectively). In terms of log RTs (as before the data were
corrected via a log transform due to their being leptokurtic and positively skewed), the only significant
effect was with the two-way ANOVA with text ("red" or "pink") and stimulus (triangle, circle); there
was an interaction between the factors, F(1, 160) = 4.01, p < 0.05, with triangles being rated more
quickly as red than as pink (p < 0.05; means of 1,763 ms and 2,403 ms, respectively).

The log-linear analysis reported in Experiment 1 was also performed here and produced similar 
results. The likelihood ratio of the model was χ²(44) = 14.24, p = 1.00, and it retained both
interactions of text × shape, χ²(2) = 48.31, p < 0.001, and text × emotion, χ²(2) = 15.56, p < 0.001. In
order to break the former interaction down, separate χ² tests were performed with each type of shape
in order to determine whether they were assigned to "Bouba" or to "Kiki" (text). Rectangle, χ²(1) =
5.90, p < 0.05, and star, χ²(1) = 40.09, p < 0.001, were allocated more to one text descriptor than to
the other (see Figure 8a); blob, χ²(1) = 3.122, p < 0.077, did so at trend level. Odds ratios indicated
that the rectangle and blob shapes were 1.20 and 1.14 times more likely to be rated as "Bouba" than 
as "Kiki," respectively. By contrast, the star was 1.61 times more likely to be rated as "Kiki" (than 
as "Bouba"), thus explaining the original interaction, and implying a genuine difference in chosen 
descriptor between the star and the other two shapes. 

The interaction between text and emotion was broken down in the same fashion, in order to
determine whether there were any significant differences in the assignment of "Bouba" or "Kiki." Happy
stimuli, χ²(1) = 7.42, p < 0.01, were 1.22 times more likely to be rated as "Kiki" than as "Bouba,"
whilst an opposite trend was observed for the sad stimuli, χ²(1) = 5.9, p < 0.02, which were 1.20
times more likely to be rated as "Bouba" than as "Kiki" (see Figure 9). The neutral stimuli were 1.14
times more likely to be rated as "Kiki" than as "Bouba," although this trend just failed to reach
statistical significance, χ²(1) = 3.39, p = 0.066.

Figure 7. Bar graph showing the proportion of different stimuli to descriptors in Experiment 2 (where *p < 0.05;
**p < 0.01; and ***p < 0.001).

[Figure 8 x-axis label: stimulus shape]

Figure 8. (a) Bar graph highlighting the proportion of blob, rectangle, and star stimuli allocated to the words
"Bouba" and "Kiki" (grey and white bars, respectively) in Experiment 2 (where *p < 0.05 and **p < 0.01); (b)
line graph showing "Bouba" and "Kiki" confidence ratings for blob, rectangle, and star stimuli (error bars 2 SEM).

A four-way between-participants ANOVA was conducted on the confidence rating data. The
factors consisted of the same independent variables as above. Although the four-way interaction was
significant, F(8, 2,160) = 3.01,¹ p < 0.01, the exploration of this interaction will not be reported here as it
does not aid in the testing of our hypotheses. The interaction between shape and text was significant,
F(2, 2,160) = 6.13, p < 0.005 (see Figure 8b). Post hoc LSD tests revealed that the participants were
more confident in assigning the label "Kiki" than the label "Bouba" to the stars (p < 0.001). There
was, however, no difference in terms of our participants' confidence when it came to assigning blobs
or rectangles to the "Kiki" and "Bouba" responses, respectively.

¹ The appropriate interpretation of this four-way interaction was determined by running a number of separate
ANOVAs for each shape with the remaining factors. For the star, only the main effect of text was significant,
F(1, 720) = 18.26, p < 0.001, with stimuli rated more confidently as "Kiki" than "Bouba" (means of 2.51 versus
2.23, respectively). For the blobby shape, the three-way interaction between colour, emotion, and text was
significant, F(4, 720) = 4.30, p < 0.01. Separate ANOVAs were conducted for each blob colour (collapsing over
emotion), and for each blob emotion (collapsing over colour); there were no significant effects from the blob emotion
ANOVAs. In the blob colour ANOVAs, however, the emotion × text interaction was significant for both white
and yellow blobs, F(2, 240) = 6.10, p < 0.01, and F(2, 240) = 3.44, p < 0.05, respectively. For white blobs (happy,
neutral, and sad mean ratings for white "Bouba" were 2.11, 2.71, 2.31, respectively, and for white "Kiki" were 2.43, 2.11,
2.35), neutral emotion "Bouba" confidence ratings were larger than neutral "Kiki" ratings (p < 0.01) as well
as happy (p < 0.01) and sad (p < 0.05) "Bouba" ratings. For yellow blobs (happy, neutral, and sad mean ratings
for yellow "Bouba" were 2.45, 2.18, 2.32, respectively, and for yellow "Kiki" were 2.63, 2.68, 2.13), neutral emotion
"Kiki" confidence ratings were larger than "Bouba" ratings for neutral emotion stimuli (p < 0.01) as well
as sad "Kiki" and "Bouba" ratings (p < 0.001, p = 0.05, respectively). Yellow neutral emotion "Bouba" ratings
were lower than happy "Kiki" ratings (p < 0.02). Finally, happy "Kiki" ratings were higher than sad "Kiki"
ratings (p < 0.02). There were no significant effects for red blobs.

Figure 9. Bar graph showing the proportion of happy, neutral, and sad stimuli associated with the words "Bouba"
and "Kiki" in Experiment 2 (where *p < 0.05 and **p < 0.01).

A four-way between-participants ANOVA was conducted with log RT as the dependent variable
and with the same independent variables as described in Experiment 1 (outlying data for RTs to stimuli
reported as "Bouba" and those reported as "Kiki" were screened separately and corrected for; <2%
in both cases). The only significant factor to emerge from this analysis was shape, F(2, 2,160) = 3.02,
p < 0.05. Post hoc LSD tests revealed that participants responded more quickly to blob stimuli than to
either rectangle (p < 0.05) or star (p < 0.02) stimuli (average computed RTs were 2,173, 2,345, and
2,375 ms, respectively).
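
The reaction-time analysis above works on log-transformed RTs after screening for outliers. The sketch below illustrates one common way of preparing such data. The 2 SD cut-off in log space, the function names, and the RT values are all assumptions made for illustration, since the screening rule actually applied is the one described for Experiment 1 and is not restated here.

```python
import math
import statistics


def screen_log_rts(rts_ms, n_sd=2.0):
    """Log-transform RTs and drop values more than n_sd SDs from the mean.

    The SD-based rule is an assumed, commonly used criterion; it is not
    necessarily the one used in the study.
    """
    logs = [math.log(rt) for rt in rts_ms]
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)
    kept = [x for x in logs if abs(x - mean) <= n_sd * sd]
    return kept, len(logs) - len(kept)


def back_transformed_mean(log_rts):
    """Geometric mean in ms, i.e. the mean log RT mapped back to ms."""
    return math.exp(statistics.mean(log_rts))


# Hypothetical RTs in ms for one condition (illustration only).
rts = [1800, 2100, 2250, 1950, 2400, 2050, 2300, 60000]
kept, n_removed = screen_log_rts(rts)
print(f"removed {n_removed} outlier(s); "
      f"back-transformed mean = {back_transformed_mean(kept):.0f} ms")
```

Working in log space reduces the influence of the long right tail that is typical of RT distributions, which is why the screening and the averaging are both done on the transformed values.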

3.3 Discussion 

The results of Experiment 2 confirm the cross-cultural existence of the crossmodal matchings that 
were originally documented in Experiment 1, although the cross-cultural aspect is tempered by the 
high proportion of North American participants relative to those from other locations. This result
therefore confirms that people really do tend to associate "sour" with boulders and "heavy" with "red," and
converges on the conclusion that lemons are fast while prunes are slow. The distribution of responses
was also similar, with agreement being stronger for the heaviness of the colour red and the
slowness of prunes than for the sourness of boulders and the speed of lemons. However, we also
observed an intramodal correspondence between circles and red, which raises questions about the 
absence of intramodal matching in the previous set of participants. The results on the "Bouba"-"Kiki"
matchings were also broadly similar to those obtained in Experiment 1 (matchings that have already been
documented recently in a remote culture; see Bremner et al., 2013, for a cross-cultural study of the
"Bouba"-"Kiki" effect), while showing some minor differences regarding the effects of emotional
cues (here, happy faces and, at trend level, neutral ones were more often associated with "Kiki" than with
"Bouba," whereas sad faces were more often associated with "Bouba"), thus suggesting that the effect of
emotion should be more thoroughly investigated in future research.



4 General discussion 

The results of the two experiments reported in the present study demonstrate the potential power of 
internet-based testing methods for those wanting to assess a variety of different crossmodal corre- 
spondences. Despite participants completing Experiment 2 unsupervised on a wide variety of hard- 
ware in various testing environments, the results of both experiments were satisfyingly congruent with 
one another (as also reported by Crump et al., 2013; Germine et al., 2012). Data collection
in Experiment 2 was, in addition, remarkably fast (82 participants in 4 hours) and very cost effective,
and indeed perhaps "revolutionary" (Crump et al., 2013). Conceivably, the main constraints on online
research are now the small number of sensory modalities that can be stimulated and the restricted
range of sensors that can be used to collect data. Smartphones do have an edge over computers in this
regard, indicating the direction of the potential next "revolutionary" step in data collection (Miller,
2012). Our results also confirm the fact that, as intuited by Peter Walker (and others; Deroy, 2011;
Smith, 2012), most people do indeed consider lemons to be fast. One of the novel findings to emerge
from the results of the present study was that most people also associate prunes with slowness, an 
association that, as far as we are aware, has not been documented previously. The experimental
technique used here allowed us to distinguish between the inter-individual (overall likelihood of matching
across all trials) and the intra-individual (confidence ratings plus RT) intuitive appeal of crossmodal
correspondences, and to check the cross-cultural validity of the results.

According to the semantic hypothesis, surprising associations between sensory features or
dimensions arise because the information available to different modalities is recoded into a more abstract,
semantic format. The cross-cultural similarities are, in themselves, important for questioning the link
between this "semantic" recoding and linguistic representations (as suggested, for instance, by
Martino & Marks, 1999): In the case of thickness and pitch, the association seems to exist (at least to
the extent of introducing interference in a perceptual task) only for Farsi participants who describe pitches as thin
(nazok) or thick (koloft) (see Dolscheid, Shayan, Majid, & Casasanto, 2013). Here, we found that
intuitive crossmodal associations were widespread across cultural and linguistic groups.

Instead of language, it has been suggested that crossmodal associations are governed by general
dimensions shared or "connoted" by the perceptual concepts applied to the stimuli (Walker & Walker,
2012). According to the semantic differential technique, introduced more than half a century ago by
Osgood et al. (1957; Snider & Osgood, 1969), many concepts can be analysed (and then matched) in
terms of dimensions anchored by pairs of polar adjectives, with the most commonly retained
dimensions being active-passive, good-bad, and dominant-submissive (e.g., Proctor & Cho, 2006).
According to this approach, for instance, angularity and bitterness are matched because both angular and
bitter fall on the same "bad" end of the bad/good dimension (i.e., they are both potentially dangerous
kinds of stimuli; Bar, 2007, 2011; Bar & Neta, 2006).
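
To make the logic of this account concrete, the following sketch represents each concept as a profile of signed ratings on the three dimensions and predicts a match whenever two concepts clearly fall on the same pole of at least one dimension. The numerical ratings, the threshold, the function name, and the sign convention (positive for the active, good, and dominant poles) are all hypothetical; only the placement of angular and bitter on the "bad" end comes from the account described above.

```python
# Each concept is rated on bipolar dimensions; the sign gives the pole.
# Positive = active / good / dominant; negative = passive / bad / submissive.
# All ratings below are invented for illustration.
PROFILES = {
    "angular": {"active-passive": 0.6, "good-bad": -0.5, "dominant-submissive": 0.4},
    "bitter": {"active-passive": 0.1, "good-bad": -0.7, "dominant-submissive": 0.0},
    "round": {"active-passive": -0.4, "good-bad": 0.6, "dominant-submissive": -0.3},
}


def shared_poles(concept_a, concept_b, profiles=PROFILES, threshold=0.3):
    """Return the dimensions on which both concepts clearly fall on the same pole.

    A rating counts as clearly polarised only if its magnitude reaches
    the (arbitrary) threshold.
    """
    a, b = profiles[concept_a], profiles[concept_b]
    return [
        dim
        for dim in a
        if abs(a[dim]) >= threshold
        and abs(b[dim]) >= threshold
        and (a[dim] > 0) == (b[dim] > 0)
    ]


print(shared_poles("angular", "bitter"))  # dimensions predicting a match
print(shared_poles("round", "bitter"))    # empty list: no match predicted
```

A rule of this kind predicts only whether a match should occur; by itself it says nothing about how strongly or how widely a given match should be endorsed.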

In answer to the question posed in the title of Brown's (1958) early paper (reprinted in Snider
& Osgood, 1969), there really is a crossmodal association (or correspondence) between boulders and
sourness; ironically, it is one for which there is even more inter-individual agreement than for the
successful "fast lemons" originally proposed by Peter Walker. That said, these crossmodal associations are
weaker than the weight of hues, as assessed here (and borrowed from the research of Alexander &
Shansky, 1976). The existence of a correspondence between boulders and sourness is at first quite
supportive of the semantic hypothesis, as it seems not to fail on this example. The fact is that we
presumably cannot rely on an individual's intuitions as necessarily reflecting a consensus (as Koriat,
2008, suggested for sound-symbolic associations between foreign words and their referents). Instead,
we need to validate each intuition with experimentation, and this is where a fast, inexpensive, and
far-reaching new technique like internet-based testing shows its strength.

Still, the existence of an intuitive matching between boulders and sourness reinforces a pre-existing
challenge for the hypothesis, as one needs to explain how it is that qualitative, metathetic dimensions
(e.g., of hue and taste quality; Spence, 2011) or natural objects with variable features (boulders) are
assigned a specific polarity: Why, for example, should yellow, lemons, and boulders all be more active?

We also examined a rarely discussed aspect of correspondences, that is, the presence of intramodal
correspondences. The results reported here suggest that there may not be a systematic intramodal
correspondence between shape and texture (soft/rough), although one would expect that both round and
soft, and triangular and rough, would be given the same polarity regarding certain semantic dimensions
("good" for the former, and "active" for the latter). There was also no systematic correspondence
between colour and shape (except for the matching of roundness with redness in Experiment 2). This
finding challenges Kandinsky's (1925) early claim that many instances of such matches exist.

In testing the "Bouba"/"Kiki" effect, we were also able to confirm that making the angularity/
roundness contrast more salient (by using a star with many angles and a square with fewer) affected
the intra-individual confidence of participants in the matching. Interestingly, the prediction generated
by a consideration of the semantic differential technique (Osgood et al., 1957), that adding cues of
sadness to the shape-cues of passivity or cues of happiness to shape-cues of activity would influence the
matching between sounds and visual images, was not borne out. What is more, the effect of emotional
faces on matching differed as a function of the region of the world in which the participants
were located, but was strong enough to substantially influence the performance of participants on the
matching task in Experiment 1, with a sad schematic line drawing of a face overriding shape cues in
the association with "Bouba." This result is, however, insufficient to refute the explanations of
crossmodal correspondences in terms of the semantic differential technique, as it is possible to construct
an explanation in terms of another shared dimension (say, for example, that both "Bouba" and "sad
face" are passive). However, this result certainly points to the need for those who support the
explanatory power of the semantic hypothesis to state precisely which dimension prevails, and why, in a given
situation.
More fundamentally, these results show the need to question whether the dimensions singled out
by Osgood et al. (1957) should be the ones (or the only ones) used when explaining crossmodal
correspondences. In this sense, Martino and Marks's (2001, p. 64) idea of an "abstract semantic network
that captures synesthetic correspondences" more carefully avoids mentioning the dimensions
inherited from work on the semantic differential technique, but could still be criticized for failing to
generate precise refutable hypotheses (Platt, 1964).

5 Conclusions 

Over the years, many researchers have been eager to explain surprising crossmodal matches in terms
of a form of semantic hypothesis (e.g., Martino & Marks, 1999, 2001; Walker & Walker, 2012; see also
Proctor & Cho, 2006): The idea is that people match sensory stimuli across different sensory
modalities because they connote the same pole on a set of fundamental dimensions, such as active/passive or
good/bad. Evaluating this hypothesis empirically is difficult and will certainly require examples used in 
its defence or against it to be robustly tested. Evaluating the hypothesis will also require researchers to 
test a number of predictions that follow on from the hypothesis. Here, we have pointed to four
difficulties: The semantic hypothesis does not explain the differences in frequency, or in individual confidence,
that exist between crossmodal associations built on the same model (fast lemons, slow prunes, sour
boulders, and heavy red); it does not explain why certain qualitative dimensions get assigned to a
specific polarity in the scaling; it only minimally predicts what happens when the connotations that are
supposed to drive the associations are made more salient; finally, it does not explain why the matches
are no longer observed once intramodal dimensions with congruent connotations are presented. Here,
we would like to suggest that these difficulties do not rule out the semantic hypothesis as an account of parts
of the reflective processes by which people rationalize their intuitive sense of crossmodal congruency,
but that they do lower its ability to explain what determines these intuitions in the first place. In this respect,
internet-based testing protocols, which are now starting to become more widely used across
various areas of psychological research (see Germine et al., 2012, for a recent review), will turn out to
be crucial in reaching out to a more varied array of potential participants than those most commonly
used in psychological experiments currently (i.e., those coming from Western, Educated, Industrialized,
Rich, and Democratic, or WEIRD, societies; see Henrich, Heine, & Norenzayan, 2010).